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Background 

It is common practice to divide a digital system into software and hardware eomponents. The 
greatest funetionality and performanee oeeurs when funetions are plaeed in dedieated system 
hardware that is highly parallel and is optimized to perform the intended operations. More often 
than not it is far too expensive in time and money to create sueh dedieated hardware so the 
available hardware resourees are used to implement a eonventional proeessor whieh exeeutes 
programmable instruetions in a highly sequential manner. While this is very flexible sueh 
conventional proeessors are ineffieient — a large percentage of hardware resourees (up to 99% of 
the logie gates that make up its hardware) are idle most of the time. Commereially-off-the-Shelf 
digital and analog hardware now exists that ean change the “eireuit” that is being implemented 
long after the deviee(s) are fabrieated. This means that high speed hardware ean now share mueh 
of the flexibility normally reserved for software-only systems. 

Not all applieations need the added speed and other possible features of reeonfigurable systems 
sueh as fault toleranee and power management techniques, but some applications that could 
greatly benefit from this technology include, Genetic Algorithms, Real time processing. Data 
eneryption/deeryption ,RSA eryptography. Data Compression, Image and Video proeessing. 
String matehing. Heat and Laplaee equations, Newton's meehanies. Continuous System 
Simulation, Evolutionary Computing, Fault Tolerant Computing, Binary 2D eonvolution, 
Boltzmann maehine, 3D graphie Aeeelerators, Discrete Cosine Transforms, Irregular 
Mathematieal operations. For this reason, investigation into emerging teehniques and 
teehnologies in reeonfigurable systems is an aetive area of researeh whieh direetly supports the 
goals and objectives of NASA. 

Project Goals 

The foeus of this projeet was to survey the teehnology of reeonfigurable eomputing determine its 
level of maturity and suitability for NASA applieations. To better understand and assess the 
effectiveness of the reeonfigurable design paradigm that is utilized within the HAF-15 
reeonfigurable eomputer system. This system was made available to NASA MSEC for this 
purpose, from Star Bridge Systems, Inc. To implement on at least one applieation that would 
benefit from the performanee levels that are possible with reeonfigurable hardware. It was 
originally proposed that experiments in fault toleranee and dynamieally reeonfigurability would 
be perform but time eonstraints mandated that these be pursued as future researeh. 

Targeted Reeonfigurable Computing Platform 

The targeted reeonfigurable eomputer system that was used in this researeh was one that was 
developed by Star Bridge Systems, Ine. to utilizes deviees whieh are eommonly ealled Field 
Programmable Gate Arrays, FPGAs. FPGAs were introduced to the hardware design eommunity 
in the mid 1980s as a means to ineorporate so-ealled random logie into a minimal number of 
integrated eircuits. Sinee that time these ehips have grown drastieally in size and funetion where 
they have had a major effeet in the areas of rapid prototyping of low volume applieation speeifie 
designs, digital logie emulation, and have served greatly to improve the time-to-market for new 
product designs. Star Bridge Systems and other eompanies are trying to eapitalize not as a rapid 
prototyping deviee but as advaneed eomputing deviees. It should be noted that eurrent FPGA 
teehnology is not optimized for this use. 
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Targeted Reconfigurable Platform 

The reconfigurable system that served in this evaluation was the HAL- 15 Hypercomputer system 
from Star Bridge Systems, Inc. It has a peak performance estimate of 40 billion fixed point 16- 
bit multiply accumulates per second operations, and 15 billion 32-bit floating point operations 
per second. It utilizes a total of 10 Xilinx 4062 800 picosecond FPGAs where each 4062 FPGA 
has 2304 Combinational Logic Blocks which translates into somewhere between 62,000-130,000 
logic gate equivalents. It is PC hosted on a PCI 32 bit 33 Mhz daughter board. Systems can 
contain multiple daughter boards. The FPGAs are interconnected with the PC and each other 
using a proprietary bus structure. There is a total of 288 Mbytes of SDRAM and a 200 Mhz 
system clock. 

The development environment is the VIVA ™ Graphical User Interface. This environment 
supports a structural object oriented design philosophy that allows for the incorporation of 
certain high level behavioral attributes some of which are normally found in textural-based 
hardware description languages such as VHDL, Verilog and Handle C These include the 
concept of polymorphism where objects are created to support data types of multiple precession, 
operator overloading, where object with the same name can be utilized to handle different 
operations, and recursion where objects are defined in terms of themselves (note method of 
representation for hardware is novel). The major synchronization technique that is used closely 
follows the data flow paradigm where objects that are to be processed sequentially are designed 
in a pipelined manner where they must wait for the objects the precede it in the pipe to complete 
before it can begin operations. To accomplish this objects have special signals that performs the 
necessary handshaking protocol. The tool comes with a large library of high-end objects such as 
floating-point multipliers, dividers, adders, transcendental functions, etc. In many ways this 
environment falls somewhere between hardware description languages and schematic capture 
packages in terms of features and function. 

Application 1: Implementation of a Genetic Algorithm 

The first application that was implemented was a Genetic Algorithm that minimized a simple 
numeric objective function. Genetic Algorithms represent a set of search techniques, which are 
modeled after the evolutionary view of biological systems. They incorporate the ideas that an 
algorithm can be represented as a pattern that can be matted with other patters via the cross-over 
operations. A fitness function can then be applied to the resulting children to determine which of 
the children and parents continue over to the next generation. There is also a small probability of 
that the pattern will mutate thereby changing one or more of its elements. 

Figure 1 shows the high-level Viva diagram for the genetic algorithm. The design is hierarchical 
with this being the top level — there are a total of 44 user created objects in this design. Here we 
see the general structure of the design with two main storage elements. The first is the next 
generation storage where the children elements are stored and the associate memory which 
performs the combined operations of next generation storage and applying the fitness function. 
In this case the population size was set at a value of 16 members. Upon receiving the Go pulse 
the Generate lnit Population module randomly generates X, Y data and applies the objective 
function for all members of the population. It then stores the results in the next generation 
storage. When it is complete it signals the for loop object to begin operations. This object pulses 
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the next generation fitness controller, which moves the children elements into the population 
using the objective function as a key. The associative memory only maintains a copy of the best 
16 elements presented to it any elements that are worst than its worst element do no get stored 
and therefore do not make it to the next generation. After this is done the cross over mutate 
module gets pulsed. This will create 16 new children by randomly mating two elements from the 
population in the associative memory and then storing them in the next generation storage. When 
this is complete the outer level controlling for loop is signaled that the iteration is complete. 
When all iterations are complete the associative memory is signaled to output the best member of 
the population. 



Figure 1; High-Level Viva Representation of the Genetic Algorithm 

After the design methodology was fully explored the genetic algorithm was implemented in a 
couple of weeks. The algorithm occupied less than one of the 10 FPGAs and produced a result, 
which was an order of magnitude faster than an equivalent software implementation. The 
implementation utilized only integer arithmetic but an equivalent fixed or floating-point 
representation could be made by modifying the objective function and a small number of objects. 

Application 2: Continuous System Simulation 

The second application that was applied was a continuous simulation of a small system of 
differential equations that form the classical initial value problem. This utilized floating-point 
arithmetic. A simple Euler integrator module was created that performed the multiply 
accumulate operation. Figure 2 shows how these were linked to implement the sin function using 
a simple second order ODE. 
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Figure 2: Continuous Simulation of a second order ODE that generates the sin(t) 

Continuous simulation seems especially suitable for reconfigurable hardware implementation 
that has the flexibility to support high-speed operations for arbitrary structured dynamic systems. 
It directly supports the low level of granularity that is present in most continuous simulations and 
allows for efficient interpretation communication between object modules —something that is too 
costly with conventional parallel processing technology. There are still major space time trade- 
offs need to be considered. How to model nonlinear constructs such as large lookup tables and 
transcendental are also a major consideration. 

Conclusion 

This project successfully demonstrated how two types of applications can be implemented on the 
HAL- 15 reconfigurable computer using the Viva design entry language. Viva is competing with 
schematic capture traditional HDL and other high level languages that are being modified to 
create hardware. There is a high learning curve associated with these languages. Viva is designed 
to minimize this learning curve. This investigator is probably biased since he has already learned 
these tools — his evaluation is that Viva is closer to schematic capture than HDLs or high-level 
language derivatives. He also feels that Viva will not replace existing digital design 
methodologies since they offer a wider amount of control and functionality which is sometime 
needed. The use of Viva may fill an important need where algorithm developers who are not 
digital designers can evaluate the effectiveness of their designs or accelerate their calculations 
without the high learning curve associated with other techniques. This is analogous to the use of 
specialized graphical object oriented software in domains ranging from discrete event simulation, 
system architecture description languages to probabilistic analysis. This is just initial research — 
more applications will have to be explored before definitive conclusions are reached. Such future 
research should include more advance Genetic Algorithms, Continuous Simulation, Neural 
Networks. Also investigations into dynamic reconfiguration techniques, fault tolerance, 
evolutionary computing and intelligent power management should be explored. 
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